Multiple genetic lineages of anadromous migratory Mekong catfish Pangasius krempfi revealed by mtDNA control region and cytochrome b

Abstract Population genetic structure of migratory fishes can reflect ecological and evolutionary processes. Pangasius krempfi is a critically important anadromous catfish in the Mekong River, and its migration pathways and genetic structure have attracted much interest. To investigate, we quantified the genetic diversity of this species using the control region (D‐loop) and Cytochrome b (Cytb) of the mitochondrial genome. Fish were sampled (n = 91) along the Mekong tributaries from upstream to estuaries and coastal areas in the Mekong Delta and compared to three samples from Pakse (Laos). The D‐loop haplotype (0.941 ± 0.014) and nucleotide diversity (0.0083 ± 0.0005) were high in all populations, but that of Cytb was low (0.331 ± 0.059 and 0.00063 ± 0.00011, respectively). No genetic difference was detected between populations, indicating strong gene flow and confirming a long migration distance for this species. Pangasius krempfi was not genetically structured according to geographical populations but was delineated into three haplogroups, suggesting multiple genetic lineages. The presence of haplogroups in each sampling location implies that migration downstream is random but parallel when the fish enter two river tributaries bifurcating from the main Mekong River. Individuals can also migrate along the coast, far from the estuaries, suggesting a longer migration path than previously reported, which is crucial for maintaining diverse genetic origin and migration pathways for P. krempfi.


| INTRODUC TI ON
The Mekong River, one of the world's longest rivers, flowing from the Tibetan plateau in China to Vietnam, has the second-highest diversity of fish in the world after the Amazon (Durand et al., 2022;Kang & Huang, 2021;Valbo-Jørgensen et al., 2009). The complicated climatic and geological history of the Mekong River influences the current phylogeographical structure and distribution of its fauna (Sodhi et al., 2004). Among the 1136 Mekong fish species, diadromous fishes (fish that migrate between fresh and marine waters for reproduction) account for only 61 species but are disproportionately economically important (Vu et al., 2020). Diadromous fishes, particularly anadromous species, have wide range of ancestral origins (Bloom & Lovejoy, 2014;Delgado & Ruzzante, 2020;McDowall, 1997) and migration strategies (McDowall, 1997;. Compared to genetic origin and migration variations between species, within-species variation has been less documented. Fang & Chaux, 1949, family Pangasiidae, is an economically and ecologically important anadromous fish in the Lower Mekong River basin (Baird, 1996;Poulsen et al., 2004). This species has the highest market price compared to other catfish because of its excellent flesh quality (Le & Duong, 2018;Vu et al., 2020), making it the main target for capture fisheries along the Mekong River basin in Laos, Cambodia, and Vietnam (Baird, 1996;Hogan et al., 2007;Roberts & Baird, 1995;Vu et al., 2020). Due to overexploitation, dam building for hydropower, habitat fragmentation, water pollution, and other environmental issues, this species, together with other Mekong fish species, has been dramatically declining (Baran & Myschowoda, 2009;Dugan et al., 2010;Hogan, 2011;Ziv et al., 2012). Consequently, it has been listed as "vulnerable" on the IUCN red list since 2011 (Baird, 2011).

Pangasius krempfi
Despite its importance, the ancestral origin and life cycle of this species have not been fully investigated. P. krempfi is anadromous, spending its growth stages in the estuaries and coastal areas of the Mekong Delta, Vietnam, with adults migrating to freshwater areas upstream of the Mekong River for spawning (Baird, 1996;Hogan et al., 2007;Poulsen et al., 2004;Roberts & Baird, 1995;Sokheng et al., 1999;Tran et al., 2021;. However, there is limited information on the downstream migration of this species. Following the upstream migration of P. krempfi, spawning is believed to occur from Khone Falls up to the Thailand-Laos border, at least 720 km from the sea (Hogan et al., 2007;Roberts & Baird, 1995;Sokheng et al., 1999). Recent studies using strontium isotope ratios ( 87 Sr/ 86 Sr) and the microchemical structure of the otolith found that P. krempfi can migrate more than 1400 km upstream (Hogan et al., 2007;. However, the locations for spawning grounds of this species are still unclear, except for the one near to Khone Falls, which was confirmed by several studies (Baird, 1996;Hogan et al., 2007;Roberts & Baird, 1995;Sokheng et al., 1999;. Tran et al. (2021) proposed two additional spawning sites in the main stream that have not yet been precisely identified, based on corresponding 87 Sr/ 86 Sr profiles in otoliths of fish collected in the Mekong Delta and in water sampled along the Mekong River. Additionally, the P. krempfi spawning season can range from May to early November, as observed in mature individuals at Khone Falls in Laos (Baird, 1996;Sokheng et al., 1999).
Such a long spawning season is most likely due to multiple spawning groups returning to the spawning ground at different times. Based on possible spatial and temporal variations in spawning activities of P. krempfi (Poulsen et al., 2004;Rainboth, 1996), and its homecoming behavior , it is hypothesized that P. krempfi originating from different natal spawning sites and/ or spawning times could be from different gene pools. Accordingly, genetic divergence among migrating populations would be predicted.
We tested this prediction using maternally inherited mitochondrial DNA (mtDNA) markers as described here.
The mitochondrial non-coding control region (or D-loop) and protein-coding gene Cytochrome b (Cytb) have been commonly used to study intraspecific genetic diversity in fish (Ochoa et al., 2015;Page & Hughes, 2010;Zhu et al., 1994). In various animal taxa, including fish, the D-loop generally evolves faster than mtDNA coding genes (Broughton et al., 2001;Brown et al., 1993;Moritz et al., 1987;Page & Hughes, 2010), making it a powerful marker in detecting recent divergence within a species (Grant, 2015). Compared to the Dloop, Cytb is more conserved, as documented in many fish species, such as the Amazon catfish Brachyplatystoma platynemum (Ochoa et al., 2015), feather-back fish Chitala chitala (Mandal et al., 2012), and silver carp Hypophthalmichthys molitrix (Chen et al., 2019). The combined use of mtDNA segments with unequal mutation rates reflects a broad view of the evolutionary history of the species (Ballard & Rand, 2005;Moritz et al., 1987;Page & Hughes, 2010).
Previous studies on the genetic diversity of pangasiid catfish (family Pangasiidae) have used a single locus, for example, the D-loop for Mekong giant catfish Pangasianodon gigas (Ngamsiri et al., 2007), and Cytb for Pangasius pangasius and Pangasianodon hypophthalmus (Ha et al., 2020;Mohindra et al., 2015), and 16 S rRNA for nine pangasiid species (Na-Nakorn et al., 2006). In another study on P. krempfi, inter-simple sequence repeat (ISSR) markers were utilized to test genetic differentiation between populations from two estuaries of the Tien and Hau Rivers (the main Mekong tributaries in the Mekong Delta of Viet Nam). The differentiation between the two populations suggested that they were from separate spawning groups . This prediction should be further verified.
In this study, we used two mtDNA loci, D-loop and Cytb, for estimating the genetic diversity of P. krempfi. This approach was used to test a prediction of genetic divergence among migrating populations for this species and provide genetic evidence of its migration pathway, especially downstream, in the Mekong Delta. This information has important implications for the management of this vulnerable species.

| Ethics statement
All of the fish samples utilized in this investigation were obtained from fishermen or market vendors. The fish were dead at the time of sampling; therefore, no ethical approval was required.

| Sampling time and location
A total of 91 fish samples ( Table 1)  in the Hau River. Fish samples were also collected in coastal areas of Ca Mau (CM), representing a non-Mekong River population. Three samples from Pakse, Laos, in middle Mekong River were also used for comparison. Fish size ranged from 0.0042 to 9.9 kg (Appendix S1: Table S1). Specifically, fish from AG, CM, and Laos had large sizes (1.08-7.05 kg), while BT (0.025-0.095 kg) and TV (0.069-0.714 kg) had only smaller sized fish. ST had a mixture of large (maximum of 9.9 kg) and small-sized fish (0.0042 kg). A fin clip from each individual was collected and stored in 95% ethanol for genetic analyses.

| Genetic analysis
DNA was extracted from the fin clips using Wizard® SV Genomic DNA Purification kit (Promega). Extracted DNA was amplified for mitochondrial Cytb and D-loop genes. Cytochrome b was amplified using a universal primer pair L15803/H16461 (Briolay et al., 1998) with PCR composition and thermal cycle conditions based on Tsigenopoulos and Berrebi (2000). The D-loop region was amplified using the primer pair of D-loop-Thr-F/D-loop-Phe-R with the thermal cycle conditions described by Cheng et al. (2012). PCR products were purified using Wizard SV Gel and PCR Clean-Up System (Promega) and then sent for DNA sequencing at Apical Scientific Sdn. Bhd, where Sanger sequencing was conducted using ABI PRISM 3730xl Genetic Analyzer (Applied Biosystems).

| Data analysis
Sequences of D-loop and Cytb were aligned using the ClustalW function in MEGA 7 (Kumar et al., 2016). Ambiguous bases were double-checked for their sequencing quality in Finch TV version 1.4.0 (Geospiza, Inc.; http://www.geosp iza.com), edited, and trimmed before further analyses. The fitting nucleotide substitution models that describe the substitution patterns were tested using the statistical model of maximum likelihood, implemented in MEGA 7. Based on the lowest Bayesian information criterion (BIC) scores (Posada & Buckley, 2004), the best nucleotide substitution model was Tamura 3-parameter model (Tamura, 1992) for D-loop and the Kimura-2 parameter model (Kimura, 1980) for Cytb. These models were used to calculate genetic distances within and between groups and the results were the same. For simplicity, only genetic distances based on the Kimura- (Nei, 1987), and nucleotide diversity [π]) (Nei, 1987;Nei & Li, 1979)  Team, 2020)). Outputs of genetic analyses (i.e., genetic diversity, genetic distances, haplotypes, and phylogenetic trees) based on the three sequence datasets were compared.
Population structure was tested with sequence data and haplotype data. Differentiation between pairwise populations was estimated by F ST and tested for their significance with 5000 permutations in Arlequin ver. 3.5 (Excoffier & Lischer, 2010).
Arlequin was also employed to partition genetic variation within and between populations. In addition, genetic distances based on the number of nucleotide differences and Tamura 3-parameter were calculated with 1000 bootstrap replicates using MEGA7. The historical demographic expansion was evaluated using Tajima's D (Tajima, 1989) and Fu's Fs tests (Fu, 1997) and mismatch distribution analysis (Excoffier, 2004;Rogers & Harpending, 1992), which were conducted in Arlequin 3.5.

| Genetic structure of P. krempfi
Pairwise K2P genetic distances between populations varied from 0.71% to 0.93% based on D-loop (Table 2), corresponding to 6.3 and 8.3 nucleotide differences, and from 0.04% to 0.13% based on Cytb, with 1 or 2 nucleotide differences ( Table 3)   Haplogroup 2 (4 haplotypes and 10 sequences) (Figure 4). The mean genetic distance between haplogroups (ranged from 1.169% to 1.552%, Table 6) was 1.63-fold higher than that between populations. The nucleotide differences between haplogroups are from 10.5 to 13.8. Pairwise F ST among these three haplotypes groups are high, from 0.707 to 0.768 (mean F ST : 0.754, p < .01).

| Haplotype diversity and structure of P. krempfi
Two (out of 6) Cytb haplotypes were shared by all the MD populations ( Figure 5), of which H_2 was the most common (accounting

| Historical demography
Neutrality tests (Table 7)  For the three haplogroups of the D-loop (Table 8)

| DISCUSS ION
The analyses of mtDNA control region and Cytb indicated relatively high haplotype and nucleotide genetic diversity, and significant genetic differentiation within and among migratory populations of

Sum of squares % of variation Fixation index
Based on D-loop

| Genetic diversity
Based on the categories of genetic diversity suggested by Grant and Bowen (1998)  However, high levels of mtDNA genetic diversity do not necessarily indicate large population size (Bazin et al., 2006). P. krempfi has been classified "vulnerable" since 2011 (Baird, 2011) and has recently encountered increased challenges, mostly as a result of dam barriers impeding migration routes, overexploitation, and habitat degradation (Nuon et al., 2020;Ziv et al., 2012). A survey in 2017 along estuary areas in the Mekong Delta reported that interviewed fishermen observed a decline in catch and smaller harvest sizes of P. krempfi (Le & Duong, 2018). Therefore, management efforts should be prioritized for this vulnerable species.

| Genetic structure of P. krempfi reveals its migration and historical origin
All populations of P. krempfi in the Mekong Delta shared common haplotypes for the D-loop and Cytb, and had low, nonsignificant genetic difference parameters, including K2P genetic distances and F ST , indicating that gene flow is high among these populations.
These results align with the migratory behavior of the species (Hogan et al., 2007;Poulsen et al., 2004). Additionally, the sharing of D-loop haplotypes and nonsignificant genetic differences (F ST ) between Laos and Vietnamese populations (D-loop and Cytb) suggests a long migration distance for the species with a projected longest path (based on samples collected in the present study) from Paske (Laos) to Ca Mau (Vietnam) of approximately 1070 km. This genetic evidence is consistent with findings based on isotope analysis and microchemical signals of the otolith, which showed that P. krempfi can migrate at least 720 km (Hogan et al., 2007), or even farther, beyond 1400 km (Tran et al., 2021;. Previous studies (Hogan et al., 2007;Tran et al., 2021) estimated migration distances from the upstream Mekong River in Laos to estuaries. In the present study, we found no significant genetic difference between CM fish, far from the estuaries, and those from the Mekong River and Laos (Table 4 and Figure 2), suggesting a longer migration path than previously reported. Like P. krempfi, other pangasiid species have long migratory behaviors, resulting in genetic homogeneity among populations, such as in Pangasius bocourti, Pangasianodon hypophthalmus (So et al., 2006), and Pangasianodon gigas (Ngamsiri et al., 2007).

TA B L E 6
Genetic distances (mean ± SE) based on Kimura-2 parameter % (below diagonal) and nucleotide differences (above diagonal) among three D-loop haplogroups.

F I G U R E 5
Median-joining network of six Cytb haplotypes from five MD populations and three samples in Laos. The red numbers indicate mutation sites. Color coding for populations and other notes are similar to those in Figure 3.
Mekong River. When fish migrate from the Mekong River downstream to Phnom Penh (Cambodia), about 330 kilometers from the sea, where the river splits into two (Pantulu, 1986;Van Zalinge et al., 2003), they can enter either the Bassac River (which flows in the same direction as the Hau River in Vietnam) or the Mekong River (the same flow as the Tien River). Even though the two rivers are linked by the Vam Nao River (An Giang), downstream migration in the two rivers can be parallel, which is supported by the finding that the ST population in the Hau River was less genetically related (based on both D-loop and Cytb) to the Tien River's other two populations (BT and TV). This result is consistent with a previous study using ISSR markers, which found a low genetic structure between the two fish groups in Tien and Hau estuaries .
Moreover, each sampling location contained all three haplogroups, implying that migration downstream is random as the fish enters the two river tributaries bifurcating from the Mekong River.
Interestingly, the phylogenetic trees (Neighbor-Joining tree and Median-joining haplotype network) constructed based on the D-loop show that P. krempfi was not genetically structured by geographic populations but by three haplogroups (Figure 4).  Figure S2) or in size classes adjusted by sampling years (Appendix S2: Figure S3).
These analyses indicate that the finding of multiple genetic lineages of P. krempfi populations would be robust. Previous studies based on morphological observation hypothesized that P. krempfi distributed along the Mekong River basin could include two distinct populations or species (Poulsen et al., 2004;Rainboth, 1996). Based on three distinct strontium isotopes in the otoliths, Tran et al. (2021) suggested that P. krempfi has three natal origins in freshwater areas along the Mekong River. Although the natal spawning sites mentioned in that study should be further verified, the main conclusion on multiple origins agrees with our findings and those from another ISSR markers study . were also under population expansion, inferred from the mitochondrial 16S rRNA gene. The expansion period for pangasiid catfish species was estimated from 0.08 to 1.44 Mya (Na-Nakorn et al., 2006), while many other fishes were from 0.02 to 0.20 Mya, before or during the last glacial maximum (Grant, 2015). The existence of multiple haplogroups could be a result from the history of Mekong which was previously connected with other river systems or fragmented into few different paleo rivers (Voris, 2000;Woodruff, 2010;Zhang et al., 2019). Other freshwater fishes in the Southeast Asian region such as chevron snakehead fish Channa striata and tire track eel Mastacembelus favus were also reported to comprise of different lineages, which were explained by the geographic changes during Pleistocene (Adamson et al., 2012;Jamaluddin et al., 2019). Thus, all of the genetic patterns observed in previous and current studies can be attributed to the Mekong River's history of fragmentation and amalgamation (Jamaluddin et al., 2019;Takagi et al., 2011).

| Comparing D-loop and Cytb
In many fish taxa, the D-loop has higher mutation rates compared to other mtDNA genes such as Cytb (Broughton et al., 2001;Brown et al., 1993;Mcmillan & Palumbi, 1997;Zhu et al., 1994). This is also true for P. krempfi as its D-loop has a 13.2-fold greater nucleotide diversity than Cytb (Table 1). Due to its conserved nature, genetic differentiation (F ST ) based on Cytb across populations (Table 4) was larger than that based on the D-loop. Similarly, higher within-population nucleotide diversity and smaller betweenpopulation genetic differences were inferred from D-loop compared to Cytb. This has also been reported for other freshwater fish species such as the feather-back fish Chitala chitala (Mandal et al., 2012), snakehead Channa striata , silver carp Hypophthalmichthys molitrix (Chen et al., 2019). Although the two sequence types differ in evolution rates, we found that, in general, they both yielded positive relationships in nucleotide diversity (Table 1), K2P genetic distances (Figure 2), and F ST (Table 4).
However, the positive correlation in pairwise K2P genetic distance between D-loop and Cytb is different between two comparison groups, in which pairwise testing showed that Laos-MD populations had larger genetic distances compared to populations within the MD.
To date, few studies have focused on comparing genetic distance estimates from D-loop and Cytb within a fish species. Some studies analyzed concatenated segments of the two sequences (Ju et al., 2021;Nazia et al., 2010) but did not mention the results for a single locus. At an interspecific level, Zhu et al. (1994) found that the sequence divergence among sister species of genus Melanotaenia (rainbowfish in Australia), estimated from Dloop and Cytb, was positively correlated. Page  Retropinna (cucumberfish, family Retropinnidae). Like the above studies, positive relationships between D-loop and Cytb were found in estimates for all genetic parameters of P. krempfi. These concordant outputs from D-loop and Cytb analyses strengthen our findings.
In conclusion, P. krempfi has moderate-to-high levels of genetic diversity. Low genetic differences and haplotype sharing across populations confirm a long migration pathway from the upstream Mekong River in Laos to estuaries and coastal areas in the Mekong Delta. Three high-divergence haplogroups suggest multiple genetic lineages for this species with variable historical demography.
The findings from the current study can be tested in a further work with more samples and more upstream sampling locations along the Mekong River, especially between Vietnam and Laos.
In addition, mtDNA sequence variation alone may not represent the precise or comprehensive evolutionary history of P. krempfi.
Biasness in mtDNA evolution could also occur. Thus, it would be prudent to include other unlinked genetic markers, such as microsatellite, single-nucleotide polymorphism (SNP), and restriction siteassociated DNA (RAD) sequencing in future studies.  Jamaluddin for valuable comments on the revision. We thank anonymous reviewers for constructive comments to improve the manuscript.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors declare no conflicts of interest.